Improving target language modeling techniques for statistical machine translation

Author

  • Maxim Khalilov
Abstract

This study explores ways of improving target language modeling (TLM) for statistical machine translation (SMT). We describe ongoing research on TLM improvement applied to the 2007 n-gram-based SMT system developed at the TALP Research Center of the Technical University of Catalonia (UPC). We consider two new language modeling improvement techniques: threshold-based TLM pruning and TLM based on statistical classes. Some of this research is still in progress. We describe the major problems encountered and outline possible solutions and plans for future research. We report results for the Spanish-English and English-Spanish language pairs from the official TC-STAR 2006 evaluation.
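As a rough illustration of what threshold-based pruning can look like, the sketch below drops n-grams whose training count falls below a per-order cutoff. The abstract does not state the pruning criterion used in the TALP system, so the count-cutoff rule, the function names (`count_ngrams`, `prune_by_threshold`), the toy corpus, and the threshold values are all assumptions made for illustration only.

```python
from collections import Counter


def count_ngrams(sentences, max_order=3):
    """Count every n-gram of order 1..max_order in tokenized sentences."""
    counts = Counter()
    for tokens in sentences:
        for n in range(1, max_order + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def prune_by_threshold(counts, thresholds):
    """Keep an n-gram only if its count reaches the cutoff for its order.

    `thresholds` maps n-gram order -> minimum count (hypothetical values).
    Orders missing from the mapping use a cutoff of 1, i.e. nothing is pruned.
    """
    return {
        ngram: c
        for ngram, c in counts.items()
        if c >= thresholds.get(len(ngram), 1)
    }


# Toy target-side corpus (illustrative only, not from the paper).
corpus = [
    "the house is small".split(),
    "the house is big".split(),
    "a house is a home".split(),
]

counts = count_ngrams(corpus, max_order=3)
# Assumed cutoffs: keep all unigrams, drop bigrams and trigrams seen only once.
pruned = prune_by_threshold(counts, {2: 2, 3: 2})

for ngram, c in sorted(pruned.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(" ".join(ngram), c)
```

Raising the cutoffs shrinks the model at the cost of discarding rare but possibly useful n-grams; in a real system the surviving counts would then be smoothed and the effect on translation quality measured.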

Similar resources

A new model for Persian multi-part words edition based on statistical machine translation

Multi-part words in English are hyphenated, with the hyphen separating the different parts. Persian also has multi-part words. Under Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead. This common incorrect use of the space leads to some s...

Improving Translation to Morphologically Rich Languages (Améliorer la traduction des langages morphologiquement riches) [in French]

While statistical techniques for machine translation have made significant progress in the last 20 years, results for translating to morphologically rich languages are still mixed versus previous generation rule-based systems. Current research in statistical techniques for translating to morphologically rich languages varies greatly ...

Using Related Languages to Enhance Statistical Language Models

The success of many language modeling methods and applications relies heavily on the amount of data available. This problem is further exacerbated in statistical machine translation, where parallel data in the source and target languages is required. However, large amounts of data are only available for a small number of languages; as a result, many language modeling techniques are inadequate f...

Topic and Sentiment in Phrase-based Statistical Machine Translation

In this paper, we model two textual properties, topic and sentiment, at the sentence and document levels, with the goal of improving the performance of machine translation by taking into account this information in source and target sentences. In the topical similarity approach, we augment the source sentence with the keywords extracted from its adjacent sentences and re-rank the candidate targ...

Integration of ASR and machine translation models in a document translation task

This paper is concerned with the problem of machine-aided human language translation. It addresses a translation scenario where a human translator dictates the spoken language translation of a source language text into an automatic speech dictation system. The source language text in this scenario is also presented to a statistical machine translation (SMT) system. The techniques presented in t...

Publication date: 2007